On the Linear Algebraic Structure of Distributed Word Representations
Author
Abstract
In this work, we leverage the linear algebraic structure of distributed word representations to automatically extend knowledge bases and allow a machine to learn new facts about the world. Our goal is to extract structured facts from corpora in a simpler manner, without applying classifiers or patterns, using only the co-occurrence statistics of words. We demonstrate that this structure can be used to reduce the data requirements of methods for learning facts. In particular, we demonstrate that words belonging to a common category, or pairs of words satisfying a certain relation, form a low-rank subspace in the projected space. We compute a basis for this low-rank subspace using singular value decomposition (SVD), then use this basis to discover new facts and to fit vectors for less frequent words for which we do not yet have vectors. This thesis represents my own work in accordance with university regulations.
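The SVD step described in the abstract can be illustrated with a minimal sketch. It does not reproduce the author's implementation: the data are synthetic stand-ins for word embeddings, the helper names `category_basis` and `subspace_score` are hypothetical, and scoring a candidate by the fraction of its norm captured by the subspace is an assumed membership test, not one stated in the abstract.

```python
import numpy as np


def category_basis(vectors, rank):
    """Return a (rank x dim) orthonormal basis for the top singular directions."""
    # Rows of vt are right singular vectors, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(vectors, full_matrices=False)
    return vt[:rank]


def subspace_score(vector, basis):
    """Fraction of the vector's norm captured by its projection onto the basis."""
    coords = basis @ vector  # coordinates of the projection in the basis
    return float(np.linalg.norm(coords) / np.linalg.norm(vector))


rng = np.random.default_rng(0)
dim, rank, n_words = 50, 3, 20

# Assumption: synthetic "category" vectors that lie near a rank-3 subspace.
true_basis = np.linalg.qr(rng.standard_normal((dim, rank)))[0].T  # rank x dim
category_vectors = rng.standard_normal((n_words, rank)) @ true_basis
category_vectors += 0.01 * rng.standard_normal((n_words, dim))  # small noise

basis = category_basis(category_vectors, rank)

# A held-out vector from the same subspace versus an unrelated random vector.
member = rng.standard_normal(rank) @ true_basis
outsider = rng.standard_normal(dim)

member_score = subspace_score(member, basis)
outsider_score = subspace_score(outsider, basis)
print(f"member: {member_score:.3f}, outsider: {outsider_score:.3f}")
```

Under these assumptions the held-out member should score close to 1 and the unrelated vector noticeably lower. With real embeddings, the choice of rank and of a score threshold would need to be tuned on known category members.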
Similar resources
A Universal Investigation of $n$-representations of $n$-quivers
We have two goals in this paper. First, we investigate and construct cofree coalgebras over $n$-representations of quivers, limits and colimits of $n$-representations of quivers, and limits and colimits of coalgebras in the monoidal categories of $n$-representations of quivers. Second, for any given quivers $\mathit{Q}_1$, $\mathit{Q}_2$, ..., $\mathit{Q}_n$, we construct a new quiver $math...
Deformation of Outer Representations of Galois Group
To a hyperbolic smooth curve defined over a number field one naturally associates an "anabelian" representation of the absolute Galois group of the base field landing in the outer automorphism group of the algebraic fundamental group. In this paper, we introduce several deformation problems for Lie-algebra versions of the above representation and show that in this way we get a richer structure than t...
Some algebraic properties of Lambert Multipliers on $L^2$ spaces
In this paper, we determine the structure of the space of multipliers of the range of a composition operator $C_\varphi$ that is induced by the conditional expectation between two $L^p(\Sigma)$ spaces.
An Analysis of the RC4 Family of Stream Ciphers against Algebraic Attacks
To date, most applications of algebraic analysis and attacks on stream ciphers target those based on linear feedback shift registers (LFSRs). In this paper, we extend algebraic analysis to non-LFSR based stream ciphers. Specifically, we perform an algebraic analysis on the RC4 family of stream ciphers, an example of stream ciphers based on dynamic tables, and investigate its implications to pot...
Unsupervised Text Normalization Using Distributed Representations of Words and Phrases
Text normalization techniques that use rule-based normalization or string similarity based on static dictionaries are typically unable to capture domain-specific abbreviations (custy, cx → customer) and shorthands (5ever, 7ever → forever) used in informal texts. In this work, we exploit the property that noisy and canonical forms of a particular word share similar context in a large noisy text ...
Journal title:
- CoRR
Volume abs/1511.06961
Pages -
Publication date 2015